Collagen is ubiquitous throughout the animal kingdom, where it comprises some 28 diverse molecules that 
form the extracellular matrix within organisms. In the 1960s, an extracorporeal animal collagen that forms 
the cocoon of a small group of hymenopteran insects was postulated. Here we categorically demonstrate that 
the larvae of a sawfly species produce silk from three small collagen proteins. The native proteins do not 
contain hydroxyproline, a post-translational modification normally considered characteristic of animal 
collagens. The function of the proteins as silks explains their unusual collagen features. Recombinant 
proteins could be produced in standard bacterial expression systems and assembled into stable collagen 
molecules, opening the door to manufacture a new class of artificial collagen materials. 

Collagens are structural proteins that are abundant and ubiquitous in the connective tissue of all metazoan 
animals. Animal collagens comprise some 28 diverse molecules that are characterised by a rod-like 
structure where polyproline II-like polypeptide chains are supercoiled about a common axis to form a 
triple-helix 1 . Since the close packing between the chains is such that only glycine (Gly) can fit in the centre of the 
triple-helix, collagen-forming proteins contain characteristic tripeptide repeats (Gly-Xaa-Yaa)n, where Xaa and 
Yaa can be any residue. The structure is stabilized by a high content (~20%) of the imino acids proline (Pro) and 
hydroxyproline (Hyp). 

Whilst it is commonly regarded that collagens function within animals, the existence of an extracorporeal 
animal collagen was postulated in the 1960s when K. M. Rudall described the X-ray diffraction pattern of some 
hymenopteran insect cocoons as collagen-like 2,3 . Rudall noted that "during the brilliant summer of 1959 there were 
plagues of gooseberry sawfly (Nematus ribesii)" and this enabled cocoons or silk fibres drawn from the salivary 
gland to be obtained in sufficient quantity for study 2 . The diffraction patterns suggest twisted cables of collagen 
molecules with dimensions of 3 nm diameter. Amino acid analyses found high Gly (33.6%) and Pro (10.0%) 
content consistent with collagens, as well as high Ala (12.2%) characteristic of silks. Of most interest was that Hyp, 
which is seen as a characteristic of animal collagens, was absent 4 . 

Other described insect collagens are apparently homologous to molecules found in mammals. For example, a 
collagen from Drosophila melanogaster 5 is similar in structure to the type IV collagen found from hydra to 
human 6 . Type IV collagens contain a central domain of about 1200 residues containing the characteristic 
(Gly-Xaa-Yaa)n collagen repeat with high levels of Pro and Hyp, but with many interruptions (up to 25) from 
inserts of up to 30 amino acids. The collagen domain is flanked by large non-collagenous domains. Cysteine 
residues in the non-collagenous domains crosslink the proteins to generate the network structure of the 
extracellular matrix. Insect collagens with similarities in composition and length (~280-290 nm) to the predominant 
interstitial type I and type II animal collagens have been isolated, for example, from locust 7 and cockroach 8 . The 
mature interstitial collagen proteins contain just over 1000 residues, which primarily consist of uninterrupted 
Gly-Xaa-Yaa repeats containing around 12% Pro and 10% hydroxyproline 7,8 . Rather than networks, these 
proteins assemble into fibrils and fibres. 

Here, we examine the cocoon silk of the willow sawfly, Nematus oligospilus, a hymenopteran species closely 
related to the sawfly species described by Rudall 2,3 . The native silk structure and composition were investigated 
using wide angle X-ray scattering and amino acid analysis. A combined proteomic-transcriptomic approach was 
used to identify the silk genes. The silk proteins were expressed in Escherichia coli and the recombinant molecules 
were demonstrated to be protease resistant and have circular dichroism spectra characteristic of collagen. 
Results 

X-ray diffraction patterns of silk. Diffraction studies using wide- 
angle X-ray scattering (WAXS) on N. oligospilus cocoon silk showed 
a 0.286 nm meridional reflection peak (Fig. 1 A) characteristic of the 
axial distance between residues along strands of the collagen triple 
helix 9 . This feature was similar to that observed by Rudall for N. 
ribesii silks 2,3 . Other observed peaks from the N. oligospilus cocoon 
silk were consistent with a structural repeat with 4.59 nm spacing in 
the axial direction (Fig. 1B) and therefore were attributed to various 
harmonics of long axial repeats of a collagen structure. Equatorial 
spots at 1.2 nm, attributed to lateral spacing between triple helices by 
Rudall, were not observed in the willow sawfly WAXS and peaks at 
0.326 and 0.382 nm were not observed in the gooseberry sawfly 
(Figure 1), implying differences in structural arrangement of the 
collagen molecules within the two silks. Previously, Rudall had 
attributed a meridional arc observed at 0.465 nm to protein in a 
cross-β structure, although no other characteristic peaks of cross-β 
structure were detected. 

Hydroxyproline (Hyp) analysis of cocoon silk. Amino acid analysis 
of the willow sawfly silk did not detect any Hyp, consistent with 
Rudall's results from gooseberry sawfly cocoons in the 1960s 2,3 . 

Native silk production. The sawfly silk proteins are produced at high 
concentrations in the labial gland, a dedicated silk gland. The silk 
gland comprises a convoluted tubular structure of about 12-15 mm 
in length when not stretched, onto which a series of nodular 
structures are attached (Fig. 2). Secretory cells within these nodules 
produce the silk proteins, which are then secreted into the gland 
lumen 10 . Haematoxylin and eosin staining of the gland shows the 
protein organized within the secretory cells into tactoids 
(Supplementary Fig. 2). 

Figure 1 | Wide angle X-ray scattering (WAXS) data from N. oligospilus 
cocoons. (A): WAXS spectrum (average of five spots). The individual 1-D 
WAXS spectra are shown in Supplementary Figure 1. (B): Proposed 
assignments of WAXS peaks from the willow sawfly and comparison with 
peaks observed in gooseberry sawfly (from Rudall) 2 . 

Figure 2 | Images of silk and silk gland of sawflies. (A): Silken cocoon with 
encased pupa. (B): Silk gland showing numerous nodules comprising large 
secretory cells attached to the lumen through delivery ducts. Scale bar 
1 mm. 

Figure 3 | Mechanical properties of cocoon silk. (A): Example stress-strain 
curves of three N. oligospilus silken fibres. (B): A summary of all data for 
measured fibres. 

Mechanical and chemical properties of native silk. Mature silk 
fibres (2.2 ± 0.4 µm diameter) were obtained directly from larvae 
that had commenced spinning their cocoon. The mechanical 
properties of the silk fibres (Fig. 3) exceeded those of mammalian 
collagens, with breaking stresses of 322 ± 68 MPa and breaking 
strains of 34 ± 4%, compared to 100-120 MPa breaking stress and 
13% strain at break reported for other animal collagens 11 . A further 
comparison of the mechanical properties of collagen and other 
biological fibres to various synthetic materials can be found in 
Gosline et al. 12 . 

The sawfly cocoons could readily be dissolved at room temperature 
in common protein denaturants such as 8 M urea or 6 M 
guanidine hydrochloride. Following denaturing polyacrylamide gel 
electrophoresis analysis, solutions of solubilized cocoons resolved as 
three discrete proteins predicted to be 53, 47 and 32 kDa (Fig. 4A, 
native silk). Under reducing conditions the protein pattern was not 
altered, indicating that disulfide bonds were not present between the 
proteins. 

Identification of silk protein sequences. The silk proteins were 
identified after matching the mass of tryptic peptides from the 
three proteins from the cocoon (Fig. 4A) to predicted tryptic 
peptide masses from proteins encoded by three cDNA populations 
identified in a silk gland cDNA library isolated from late final instar 
larvae (53 kDa: 9 peptides; 47 kDa: 9 peptides; and 32 kDa: 3 
peptides; Fig. 5). Over 70% of the sequences obtained from the 
sawfly silk gland cDNA library encoded the three silk proteins. 
The measured tryptic fragments did not match any other proteins 
in the GenBank database. The primary amino acid sequences of the 
proteins are shown in Figure 5 and described in the Discussion. 

Expression and analysis of silk proteins. The ability of the sawfly 
silk proteins to fold into collagens was confirmed after recombinant 
expression of the proteins. The small size and absence of Hyp in the 
silk proteins were conducive to their recombinant expression in 
standard E. coli fermentation systems (Fig. 4A, rSfC A-C). The 
recombinant proteins were expressed from two different vectors 
(pET and pCold) and in neither case did they migrate according to 
their predicted molecular weight or alongside their native 
equivalents. Whilst it is common for collagen proteins to migrate 
inconsistently with protein markers, the inconsistency with the 
native proteins suggests either the proteins are not fully unfolding 
in the guanidine hydrochloride solubilisation treatment or that post- 
translational modifications are occurring in the native system. The 
recombinant molecules were digested with pepsin at 20 °C, a 
technique commonly used to isolate and purify collagen molecules. 



Consistent with the collagen structure, a substantial part of each of 
the recombinantly-produced sawfly silk proteins was resistant to 
pepsin digestion, whereas the entire molecule was protease- 
sensitive at higher temperatures. The circular dichroism spectra of 
the pepsin-resistant fragments confirmed the proteins were folded 
into collagen triple-helices (Fig. 4B), with each spectrum showing 
characteristic collagen positive ellipticity around 220 nm 13 . 

Discussion 

The present study examines the silk (Fig. 2) of the willow sawfly, 
Nematus oligospilus, a hymenopteran species restricted to willows in 
temperate areas of Australia. As with all silks, the cocoon is produced 
from a concentrated protein solution that the insect accumulates in a 
dedicated silk gland. In the willow sawfly, the silk gland is derived 
from the larval labial gland, a common adaptation in insects 10 . In 
contrast to homologous glands where the silk proteins are produced 
in the anterior region and then accumulate in a posterior lumen, the 
lumen of the sawfly silk gland is ringed with secretory units along its 
entire length (Fig. 2), and the silk proteins are organized into higher 
level structures within the secretory units (Supplementary Fig. 2). 
The cocoon is made of silk-like micro-scale fibres (Fig. 2) and the X- 
ray diffraction pattern from the fibres is characteristic of a collagen 
triple helix (Fig. 1). However, the fibres dissolve readily in protein 
denaturants, a property not common to either collagens or silks 
(Fig. 4A). Despite their comparatively low chemical stability, the 
mechanical properties of the silk fibres exceed those of mammalian 
collagens (Fig. 3), with breaking stresses and breaking strains around 
three times higher than those reported for other animal collagens 11 . 
Collectively, these findings suggest the sawfly produces unusual silk 
using novel collagens. 
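
The "around three times higher" comparison can be checked directly from the values quoted in the Results. The following is a minimal sketch only; the function name `fold_range` is introduced here for illustration and is not part of the original analysis.

```python
def fold_range(value: float, reference_low: float, reference_high: float) -> tuple[float, float]:
    """Return the (smallest, largest) fold difference of `value` over a reference range."""
    return value / reference_high, value / reference_low


if __name__ == "__main__":
    # Values quoted in the Results: sawfly silk 322 MPa and 34% strain,
    # other animal collagens 100-120 MPa and 13% strain.
    stress_low, stress_high = fold_range(322.0, 100.0, 120.0)
    strain_fold = 34.0 / 13.0
    print(f"breaking stress: {stress_low:.1f} to {stress_high:.1f} fold")
    print(f"breaking strain: {strain_fold:.1f} fold")
```

Using the mean values only, the stress ratio falls between roughly 2.7 and 3.2 and the strain ratio is about 2.6; the reported standard deviations are not propagated in this sketch.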

Figure 4 | Sawfly silk proteins. (A): SDS-PAGE of recombinant sawfly silk proteins alongside native silk proteins with protein marker ladder shown on 
left. (B): Circular dichroism spectra of recombinant sawfly cocoon collagens after pepsin treatment showing characteristic collagen maxima at about 

Figure 5 | Architecture and amino acid sequences of the willow sawfly silk proteins. Tryptic peptides identified by mass spectrometry from proteins in 
cocoon silk are underlined. The first position of each Gly-Xaa-Yaa repeat is shown in red. 

The solubilized cocoon contained three silk proteins (Fig. 4A) 
whose primary sequences were identified from cDNA isolated from 
a silk gland cDNA library (Fig. 5). Analysis of the protein sequences 
conclusively identified them as collagen-forming proteins: each 
contained a central sequence block of 60-62 contiguous Gly-Xaa-Yaa 
repeats that comprised the majority of the protein sequence. 
Interestingly, the length of the collagen domains (180 to 186 
residues) corresponds to an axial length of 515 to 532 Å, similar to the 
axial period of 550 Å observed in negatively stained tactoids from 
gooseberry sawfly by Rudall (1967). The collagen structure in the silk 
proteins was further confirmed after the three proteins were 
individually expressed in E. coli (Fig. 4A) and purified recombinant 
molecules were demonstrated to be protease-resistant with circular 
dichroism spectra consistent with the collagen structure (Fig. 4B). 
Comparison of the primary sequence, size and molecular 
organization (Fig. 6) of the sawfly silk proteins to animal collagen proteins 
available in the GenBank database found the silk collagens were only 
similar to each other and not to other described collagens, suggesting 
they evolved independently of other collagens and thereby 
constitute a new class of animal collagens. We termed the sawfly silk 
sequences SfC A-C (Sawfly Collagen A, B or C). 
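
The axial lengths quoted for the collagen domains can be reproduced if one assumes that each residue advances the helix by the 0.286 nm axial rise seen in the WAXS meridional reflection. The sketch below makes that assumption explicit; the constant and function names are illustrative and do not come from the original work.

```python
# Axial rise per residue along the triple helix, assumed equal to the
# 0.286 nm meridional reflection reported in the Results.
RISE_PER_RESIDUE_NM = 0.286


def axial_length_angstrom(n_residues: int, rise_nm: float = RISE_PER_RESIDUE_NM) -> float:
    """Length of a triple-helical domain in angstroms (1 nm = 10 angstroms)."""
    return n_residues * rise_nm * 10.0


if __name__ == "__main__":
    for n_triplets in (60, 62):
        n_residues = 3 * n_triplets  # each Gly-Xaa-Yaa repeat is three residues
        length = axial_length_angstrom(n_residues)
        print(f"{n_triplets} triplets = {n_residues} residues, about {length:.0f} angstroms")
```

Under this assumption, 180 and 186 residues give about 515 and 532 Å respectively, matching the range stated in the text.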

Figure 6 | Comparison of the molecular architecture of collagen-forming 
silk proteins SfC A-C to a selection of other animal collagens. These 
include an interstitial collagen, α1[I]; a network-forming collagen, α1[IV]; a 
beaded filament-forming collagen, α1[VI]; and 2 FACIT collagens of 
different sizes, α1[IX] and α1[XXI]. Lines indicate comparative protein 
length and blocks indicate regions of collagen-forming sequence within the 
chains. The published sequences for the collagens from the insect 
Drosophila melanogaster are highly homologous to the type IV sequences. 

The vast majority (98%) of the triplets in the sawfly silk proteins 
are commonly found in other animal collagens, which only contain a 
few of the 400 possible combinations at any significant level 14 . Only 
2% of the triplets, Gly-Glu-Phe and Gly-Lys-Ile in SfC A, Gly-Ala-Met 
in SfC B and Gly-Asn-Gln in SfC C, are considered rare 14 . 
However, the distribution and composition of imino acids in the 
sawfly silks are highly unusual for animal collagens, in that there are 
no hydroxyprolines, a post-translational modification of proline 
considered characteristic. In other animal collagens, the X-position 
in the primary sequence Gly-Xaa-Yaa triplet is commonly proline 
and the Y-position is frequently hydroxyproline. The overall high 
level of imino acids serves to correctly position the protein backbone 
for collagen folding 15 , and Hyp in the Y-position is associated with 
thermal stability 16 . A lack of hydroxyproline, however, is not unique. 
Recently a new group of collagen-like proteins has been described 
from bacteria which also lack hydroxyproline 17 . A range of these 
proteins have been characterised as recombinant products 18-20 . 
This work has shown that these collagens were stable at around 
mammalian body temperature, 35°C-38°C, despite the lack of 
hydroxyproline, with intra-molecular ion pair formation being 
partly responsible for this stability 21,22 . 

In addition to the absence of Hyp, the proline distribution in the 
sawfly silk proteins is biased, occurring predominantly in the X- 
position (53%, 39% and 38% for SfC A-C, respectively) and rarely 
in the Y-position (6%, 4% and 8%, respectively). In other animal 
collagens, proline occupies around 28% of the X-position and Hyp 
occupies about 38% of the Y-position 23 . However, the total proportion of imino acids 
in the sawfly silk proteins is similar to that found in other animal 
collagens, comprising around 16% of the residues in the collagen-
forming domains, consistent with the role of these residues in assisting 
the folding of the protein backbone into the collagen structure. 
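
The positional proline figures above come from counting residues at the Xaa and Yaa positions of each triplet. A minimal sketch of that bookkeeping follows. The sequence `TOY_DOMAIN` is a hypothetical example, not one of the SfC sequences, and the function names are introduced here only for illustration. The sketch assumes the domain consists purely of uninterrupted Gly-Xaa-Yaa triplets in single-letter code.

```python
def split_triplets(domain: str) -> list[str]:
    """Split an uninterrupted Gly-Xaa-Yaa domain into triplets, checking each starts with G."""
    if len(domain) % 3 != 0:
        raise ValueError("domain length must be a multiple of three")
    triplets = [domain[i:i + 3] for i in range(0, len(domain), 3)]
    for triplet in triplets:
        if triplet[0] != "G":
            raise ValueError(f"triplet {triplet!r} does not start with glycine")
    return triplets


def positional_fraction(domain: str, residue: str = "P") -> tuple[float, float]:
    """Fraction of X positions and of Y positions occupied by `residue`."""
    triplets = split_triplets(domain)
    n = len(triplets)
    x_fraction = sum(t[1] == residue for t in triplets) / n
    y_fraction = sum(t[2] == residue for t in triplets) / n
    return x_fraction, y_fraction


def overall_from_positions(x_fraction: float, y_fraction: float) -> float:
    """Fraction of all residues, given X and Y occupancies (Gly position never counts)."""
    return (x_fraction + y_fraction) / 3.0


# Hypothetical six-triplet domain used only to exercise the functions.
TOY_DOMAIN = "GPAGPKGEPGPRGAQGPS"

if __name__ == "__main__":
    x, y = positional_fraction(TOY_DOMAIN)
    print(f"toy domain: Pro in X = {x:.2f}, Pro in Y = {y:.2f}")
    # Occupancies reported in the text for SfC A, B and C.
    for name, (px, py) in {"SfC A": (0.53, 0.06), "SfC B": (0.39, 0.04), "SfC C": (0.38, 0.08)}.items():
        print(name, f"{overall_from_positions(px, py):.1%} of residues")
```

Applied to the reported occupancies, this gives roughly 20%, 14% and 15% proline for SfC A, B and C, which averages close to the 16% quoted for the collagen-forming domains.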

Mechanisms to increase thermal stability in the sawfly silks, in the 
absence of Hyp in the Yaa position, include the presence of particular 
triplets that enhance collagen stability. Most notably, SfC A-C have 
7, 4 and 6 arginine residues in the Yaa position, respectively. Arginine 
in the Y-position confers triple-helix stability similar to Hyp 24 . 
However, the ecology of the animal and biology of silk production 
make the requirement for thermal stability in the collagen molecules 
less stringent than the requirement in mammals. Sawflies are 
poikilothermic and therefore only require the collagen to be stable at 
environmental temperatures, rather than mammalian body temperature. 
Furthermore, the biophysical environment of the silk proteins 
both during silk fabrication and post-fabrication serves to enhance the 
stability of the collagen molecules. Unlike other collagens, the silk proteins 
are maintained in a highly concentrated silk protein solution in the 
silk gland. The thermal stability of collagen rises as water is removed, 
such as when aggregates form 25,26 , and therefore concentration and 
aggregation of the collagen molecules in the silk gland probably 
serve to enhance the molecules' thermal stability. After silk fibre 
fabrication, the collagen is dehydrated and the molecules will be 
stable at any normal environmental temperature. 

The collagen domains of the sawfly silk proteins are flanked by 
variable length glycine-rich, non-collagenous domains of 49, 103 and 
41 (N-termini) and 21, 46 and 84 (C-termini) residues for the SfC A, 
B and C chains, respectively. Repetitive motifs occurred in the C- 
terminal domain of the SfC B and SfC C proteins: eight repeats of the 
tetrapeptide Gly/Tyr-Asp-Asn-Lys in SfC B, and 14 repeats of the 
pentapeptide Gly-Tyr-Asp-Asn-Lys in SfC C. With the exception of 
the C-terminal domain of SfC C, which showed 76% similarity over 
83 residues to a Gly-Tyr-Asp-Asn-Lys repeat in an uncharacterized 
protein from the fungus Arthroderma otae, none of the terminal 
domains showed sequence similarities to known proteins. 
Generally, animal collagens contain N- and C-terminal peptides that 
are essential for folding of the triple-helix and correct fibrillogenesis 
but are then proteolytically removed and are not present in the 
mature material 27 . Several peptides corresponding to the N- and C-
termini of the sawfly silk proteins were identified by liquid 
chromatography-tandem mass spectrometry of proteins solubilized from 
cocoons (Fig. 5), indicating that most of these regions, if not all, are 
present in cocoons. These regions may contribute to the mechanical 
properties of the fibres. 

The data presented in this paper have demonstrated the presence 
of the collagen triple-helix structural motif in an insect silk, 
confirming the initial proposal by Rudall in the 1960s 2 . Collagen is an 
established biomedical material and many tons of animal-derived collagen 
are used annually in pharmaceutical and medical applications. 
However, there have been concerns over batch-to-batch variability 
and the transmission of diseases in the materials, leading to a 
preference for development of a recombinant material that can be 
produced under controlled conditions 28 . Replication of other animal 
collagens in recombinant systems requires co-expression of enzymes 
capable of converting a number of the proline residues in the protein 
to Hyp 29,30 and, to date, such systems have not allowed large-scale 
production of the material 31 . The discovery of relatively small 
collagen proteins that fold into collagen triple-helices but do not contain 
hydroxyproline, that can be fabricated into materials from 
concentrated protein solutions, and that form strong fibres without covalent 
cross-linking provides opportunities for collagen materials to be 
prepared as recombinant products in E. coli. The collagen silk proteins 
therefore constitute a new class of animal collagens ideally suited to 
the development of innovative collagen-based biomedical materials. 

Methods 

Collection and culture of insects. Willow sawfly larvae (Nematus oligospilus) were 
collected from willow trees (Salix sp.) around Lake Burley Griffin (Canberra, 
Australia) during the summers of 2007-2012. Larvae were maintained in the 
laboratory on a diet of fresh willow leaves. 

Amino acid analysis. After spinning the cocoons, the sawflies were removed from the 
cocoons before metamorphosis was complete and cocoon parts not in contact with 
leaves or other substrate were selected for analysis. Six cocoons were washed in 
distilled water for 30 min three times and dried. The hydroxyproline composition of 
the washed silk was determined in duplicate after 24 h gas phase hydrolysis at 110°C 
using the Waters AccQTag Ultra chemistry at the Australian Proteome Analysis 
Facility Ltd (Macquarie University, Sydney). The limit of sensitivity in this analysis 
was estimated as 0.1 pmol total, corresponding to about 1 in 5000 residues. 

Wide angle X-ray scattering (WAXS). Diffraction patterns were collected at the 
SAXS/WAXS beam-line of the Australian Synchrotron. The beam-line was operated 
at a beam energy of 18 keV (wavelength of 0.6888 Å) with a sample to detector 
distance of 550 mm, yielding a q-range of approximately 0.08-2.6 Å⁻¹. Use of a pixel 
counting detector (Pilatus-1M, Dectris, Baden, Switzerland) and evacuated flight tube 
achieved a reasonable signal to noise ratio from the weakly scattering sample. Data 
reduction and azimuthal integration, to produce 1D profiles, were achieved using 
scatterBrain (Australian Synchrotron). 
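
The beam-line parameters above are linked by standard relations: wavelength λ = hc/E, real-space spacing d = 2π/q, and q = 4π sin(θ)/λ. The sketch below applies them to the quoted values as a consistency check. It is not the scatterBrain reduction itself, and all function names are introduced here for illustration.

```python
import math

# Product of Planck's constant and the speed of light, in keV x angstrom.
HC_KEV_ANGSTROM = 12.39842


def wavelength_angstrom(energy_kev: float) -> float:
    """X-ray wavelength in angstroms for a given photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def d_spacing_angstrom(q_inv_angstrom: float) -> float:
    """Real-space repeat distance d = 2*pi/q."""
    return 2.0 * math.pi / q_inv_angstrom


def q_from_d(d_angstrom: float) -> float:
    """Scattering vector magnitude q = 2*pi/d."""
    return 2.0 * math.pi / d_angstrom


def scattering_angle_deg(q_inv_angstrom: float, wavelength: float) -> float:
    """Full scattering angle 2*theta in degrees, from q = 4*pi*sin(theta)/lambda."""
    return 2.0 * math.degrees(math.asin(q_inv_angstrom * wavelength / (4.0 * math.pi)))


if __name__ == "__main__":
    lam = wavelength_angstrom(18.0)
    print(f"wavelength at 18 keV: {lam:.4f} angstroms")
    print(f"d-range: {d_spacing_angstrom(2.6):.2f} to {d_spacing_angstrom(0.08):.1f} angstroms")
    q_rise = q_from_d(2.86)  # the 0.286 nm meridional reflection
    print(f"0.286 nm reflection: q = {q_rise:.3f} per angstrom, "
          f"2-theta = {scattering_angle_deg(q_rise, lam):.1f} degrees")
```

With these relations, 18 keV corresponds to 0.6888 Å, and the stated q-range covers spacings from about 2.4 Å to about 79 Å, so both the 0.286 nm reflection and the 4.59 nm axial repeat fall inside the accessible window.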

Protein solubilization and gel electrophoresis. Sawfly cocoons were incubated in 
6 M guanidine HCl with or without 5% v/v 2-mercaptoethanol for 30 min at 95°C, 
after which no solid material remained. The guanidine HCl was removed by 
precipitating the proteins with nine volumes of cold 100% ethanol, 
incubating at -20°C for 1 h and micro-centrifuging at maximum speed (15,000 g) for 
15 min at 4°C. The pellet was washed with cold 90% ethanol and dried, and the 
proteins were resuspended in SDS-PAGE running buffer as required. Solubilized protein 
solution was run on a NuPAGE 4-12% Bis-Tris protein gel (Life Technologies) and 
stained with Coomassie Blue R-250. 

Mechanical testing. When disturbed, final instar larvae in the process of spinning 
would rapidly move away from stimuli, leaving a single fibre that could be collected for 
mechanical testing. Fibres were mounted across a 2 mm gap on paper frames, fixed at 
either end with epoxy glue, and examined on an optical microscope to determine their 
exact gauge length and diameter (and to examine and discard any samples with 
defects before mechanical testing). Tensile measurements were carried out on an 
Instron Tensile Tester model 4501 at a rate of 2.5 mm min⁻¹. Tests were conducted in 
air at 21°C and 65% relative humidity. Data from fibres that broke at the mounting 
points were excluded. 
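
Breaking stress and strain follow from the measured force, extension, fibre diameter and gauge length. The sketch below shows one common way to make that conversion. It assumes a circular fibre cross-section and engineering (not true) stress and strain, neither of which is stated in the text, and it uses the nominal 2 mm gap as the gauge length where the actual analysis used the measured value for each fibre. The function names are illustrative.

```python
import math


def cross_section_area_m2(diameter_um: float) -> float:
    """Area of a circular fibre cross-section in square metres (assumed geometry)."""
    radius_m = diameter_um * 1e-6 / 2.0
    return math.pi * radius_m ** 2


def engineering_stress_mpa(force_mn: float, diameter_um: float) -> float:
    """Engineering stress in MPa from force in millinewtons and diameter in micrometres."""
    return (force_mn * 1e-3) / cross_section_area_m2(diameter_um) / 1e6


def engineering_strain_percent(extension_mm: float, gauge_length_mm: float = 2.0) -> float:
    """Engineering strain in percent; the 2 mm default is the nominal mounting gap."""
    return 100.0 * extension_mm / gauge_length_mm


if __name__ == "__main__":
    # Illustrative inputs chosen to land near the reported means, not measured data.
    print(f"stress: {engineering_stress_mpa(1.224, 2.2):.0f} MPa")
    print(f"strain: {engineering_strain_percent(0.68):.0f} %")
    # Time to reach that extension at the stated crosshead rate of 2.5 mm per minute.
    print(f"time to break: {0.68 / 2.5 * 60:.0f} s")
```

For a 2.2 µm fibre, a breaking stress near 322 MPa corresponds to a load of only about 1.2 mN, which indicates the force resolution such measurements require.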

Silk protein gene discovery. The DNA sequences corresponding to the silk protein 
genes were obtained from cDNA isolated from a cDNA library constructed from silk 
glands of four N. oligospilus larvae according to methods described previously 32 . 
Briefly, 29.1 µg total RNA was isolated using the RNAqueous-4PCR kit (Life 
Technologies), from which mRNA was isolated using the Micro-FastTrack™ 
2.0 mRNA Isolation kit (Life Technologies). This mRNA was used to construct a 
cDNA library using the CloneMiner™ cDNA kit (Life Technologies). The cDNA 
library comprised approximately 2.9 × 10⁷ colony forming units, with an average 
insert size of 1.1 kb. From this library, 50 randomly chosen clones were sent for 
sequence analysis. Three genes represented by three distinct groups of sequences 
containing repeating (Gly-Xaa-Yaa)n sequences were found in 20 of the clones and 
these were termed SfC A, B and C (deposited in GenBank as KF534808, KF534809 
and KF534807). The other clones contained vector-only sequences with the exception 
of five, which were found as single occurrences, with possible identities determined 
from database searches as peptidase, esterase, actin, RNA helicase and translation 
elongation factor. SfC A and SfC B were represented by two sequences differing by 8 
and 6 single nucleotide polymorphisms respectively, suggesting the presence of two 
separate alleles. No variations were seen within the SfC C group of sequences. 
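
Translated clone sequences can be screened for collagen-like regions by looking for long runs of residues with glycine at every third position. The sketch below is one simple way to do this and is not the procedure used in this work; the function name and the test sequence are hypothetical.

```python
def longest_gxy_run(sequence: str) -> tuple[int, int]:
    """Return (start_index, n_triplets) for the longest run of contiguous G-X-Y triplets.

    All three reading frames of the protein sequence are scanned, because the
    collagen domain can begin at any offset from the N-terminus.
    """
    best_start, best_count = 0, 0
    n = len(sequence)
    for frame in range(3):
        i = frame
        while i + 3 <= n:
            if sequence[i] != "G":
                i += 3
                continue
            start, count = i, 0
            # Extend the run while every third residue is glycine.
            while i + 3 <= n and sequence[i] == "G":
                count += 1
                i += 3
            if count > best_count:
                best_start, best_count = start, count
    return best_start, best_count


if __name__ == "__main__":
    # Hypothetical sequence: 5-residue N-terminus, five triplets, 4-residue C-terminus.
    toy = "MKSAY" + "GPAGPKGEPGPRGAQ" + "YDNK"
    start, count = longest_gxy_run(toy)
    print(f"longest run: {count} triplets starting at index {start}")
```

A run of 60 or more triplets, as found in SfC A-C, would stand out clearly against the short chance runs expected in unrelated proteins.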

Mass spectrometry. The protein bands on SDS-PAGE gels were cut out with a razor 
and analyzed as described previously 32 . The proteins were digested with trypsin, and the 
resultant peptides were analysed by liquid chromatography-tandem mass spectrometry 
on an Agilent LC/MSD Trap XCT spectrometer. Agilent SpectrumMill software was 
used to compare the peptides to both the sequences obtained from a cDNA library of 
the sawfly silk gland and to the entire NCBI database of protein sequences. 
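
The principle behind matching observed peptides to predicted tryptic peptides can be illustrated with an in silico digest. The sketch below is a simplification: it compares intact monoisotopic peptide masses only, whereas SpectrumMill also scores MS/MS fragment spectra. The cleavage rule (after Lys or Arg unless followed by Pro), the 0.5 Da tolerance, the function names and the test sequence are assumptions made for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONOISOTOPIC[residue] for residue in peptide) + WATER


def match_observed(observed: list[float], protein: str, tolerance_da: float = 0.5):
    """Pair each observed mass with predicted tryptic peptides within the tolerance."""
    predicted = {p: peptide_mass(p) for p in tryptic_digest(protein)}
    return [(m, p) for m in observed for p, pm in predicted.items() if abs(m - pm) <= tolerance_da]


if __name__ == "__main__":
    toy_protein = "GPAGPKGEPGPRGAQGYDNK"  # hypothetical sequence, not an SfC protein
    for peptide in tryptic_digest(toy_protein):
        print(peptide, f"{peptide_mass(peptide):.3f}")
    print(match_observed([525.29], toy_protein))
```

In practice a search engine also allows for missed cleavages, charge states and modifications, which this sketch omits.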

Protein expression and purification. Full length cDNA clones of each N. 
oligospilus collagen type (SfC A-C) were PCR amplified using oligonucleotides 
designed against the 5' and 3' cDNA sequences and with appropriate restriction 
enzyme sites for transfer into the pET14b vector (Novagen) behind the His tag. The 
amplicons were purified, digested with the restriction enzymes, cloned into pET14b 
vectors (Novagen) and the sequences verified by DNA sequencing of the inserts. 
Constructs with the correct sequence were used for expression in E. coli Rosetta 2 
(DE3) cells (Novagen) using Overnight Express media (Novagen) at 20°C for 48 h. 
Alternatively, sequences were modified at either end using the QuikChange Site-
Directed Mutagenesis Kit (Agilent Technologies) according to the manufacturer's 
instructions, to allow transfer into the pDONR 222 vector (Invitrogen). In order to 
increase expression of SfC C, the V-domain from the S. pyogenes Scl2 gene, which is a 
registration and triple-helix promoting sequence, was inserted at the N-terminus of 
the construct. Sequences were then inserted into pCold expression vectors (Takara 
Bio Inc., Japan), then transformed into competent E. coli BL21 cells for expression as 
required. Transformed cells were grown on auto-induction media (Novagen) at 20°C 
or 2YT media (16 g L⁻¹ tryptone, 5 g L⁻¹ yeast extract, 5 g L⁻¹ NaCl) containing 
0.1 mg/ml ampicillin at 37°C in shaker flasks for 7 h, cooled to 25°C, induced with 
1 mM IPTG, grown for a further 10 h, and then a further 16 h at 15°C. The cells were 
harvested by centrifugation (3000 g for 30 min). 

Cells were lysed using BugBuster and expression levels were assessed using SDS-
PAGE. For circular dichroism, cells were lysed by sonication in 40 mM sodium 
phosphate buffer pH 8.0, and the cell lysate was clarified by centrifugation (20,000 g for 
40 min) and the clear supernatant retained. The expressed silk proteins were purified 
by adsorbing the clarified lysates on an IMAC HyperCel™ column (Pall), with elution 
by stepwise increments up to 500 mM imidazole, adjusted to pH 8.0 with HCl. Cross-
flow filtration was used to lower the salt content and to concentrate the protein 
solution. Final purification was by gel permeation chromatography on a HiPrep 
Sephacryl™ S-200 column (GE Healthcare). The individual triple-helical collagen 
segments were prepared by digestion of the proteins with 0.1 mg/ml pepsin in 50 mM 
acetic acid. Purity of all products was assessed by SDS-PAGE. 

Circular dichroism spectroscopy. Circular dichroism spectra were collected for 
pepsin-treated silk protein samples in 50 mM acetic acid using 1 mm path length cells 
in a JASCO J-815 instrument. 